Styling Your Plots

GVPT399F: Power, Politics, and Data

Data visualization

We will use data visualization to answer the following question:

Do cars with big engines use more fuel than cars with small engines?

Add useful titles and labels

library(tidyverse)  # loads ggplot2 (and its mpg data), dplyr, forcats
library(ggthemes)   # provides scale_color_colorblind()

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(colour = class)) + 
  geom_smooth(method = "lm") + 
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  ) + 
  scale_color_colorblind()

Flexible visualization

Aesthetics can map expressions, not just variables. Mapping color to class == "2seater" highlights the two-seater cars against all the others.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class == "2seater"))
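A variant of the same highlight, sketched under the assumption that you want a readable legend and colors of your own choosing: compute the flag as a column first (the column name `two_seater` is hypothetical), then pick the two colors with scale_colour_manual().

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper column: TRUE for two-seater cars, FALSE otherwise
mpg_flagged <- mutate(mpg, two_seater = class == "2seater")

# Grey out the other cars so the highlighted group stands out;
# the names of `values` match the TRUE/FALSE levels of the flag
p <- ggplot(mpg_flagged, aes(x = displ, y = hwy, colour = two_seater)) +
  geom_point() +
  scale_colour_manual(values = c("FALSE" = "grey70", "TRUE" = "firebrick"))
p
```

Naming the column gives the legend a meaningful title instead of the raw expression.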

Changing the look of your plots

To give every point the same fixed color, set it as an argument to the geom, outside aes():

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), colour = "red")

EXERCISE

What’s gone wrong with this code? Why are the points not blue?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

ANSWER

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "blue")
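One way to see the difference between the two versions is to inspect the data ggplot2 actually draws. This sketch uses layer_data() to compare the colors produced by mapping "blue" inside aes() with setting it outside.

```r
library(ggplot2)

# Inside aes(): "blue" is treated as a data value (one constant category),
# so it goes through the default colour scale and gets a legend entry
p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): "blue" is a fixed setting applied to every point
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the built data for the first layer, including colours
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "blue"
unique(layer_data(p_set)$colour)     # "blue"
```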

EXERCISE

  1. Name a categorical variable in mpg. Name a continuous one.

  2. Map a continuous variable to color. How does this aesthetic behave differently for categorical vs. continuous variables?

  3. Map class to the shape aesthetic. What does the warning tell you?

ANSWERS

Categorical

  • manufacturer
  • model
  • trans
  • drv
  • fl
  • class

Continuous

  • displ
  • year
  • cyl
  • cty
  • hwy
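A quick way to check how each column of mpg is stored, sketched with base R's sapply(). Storage type is only a rough guide: year (1999 or 2008) and cyl (4, 5, 6 or 8) are numeric but take just a handful of values, so they can reasonably be treated either way.

```r
library(ggplot2)  # for the mpg data

# TRUE for every column of mpg stored as a number
is_num <- sapply(mpg, is.numeric)

names(mpg)[!is_num]  # character columns: treat as categorical
names(mpg)[is_num]   # "displ" "year" "cyl" "cty" "hwy"
```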

ANSWERS

ggplot(mpg, aes(x = displ, y = hwy, colour = cyl)) + 
  geom_point()
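To see the contrast the question asks about, this sketch maps the same variable both ways: as a number (a continuous color gradient) and wrapped in factor() (distinct hues with one legend key per level).

```r
library(ggplot2)

# Continuous mapping: cyl is numeric, so ggplot2 uses a colour gradient
p_cont <- ggplot(mpg, aes(x = displ, y = hwy, colour = cyl)) +
  geom_point()

# Categorical mapping: factor() turns cyl into discrete levels,
# so ggplot2 uses distinct hues and a legend key for each level
p_cat <- ggplot(mpg, aes(x = displ, y = hwy, colour = factor(cyl))) +
  geom_point()

p_cont
p_cat
```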

ANSWERS
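A sketch for the third question. In the ggplot2 versions we have used, the warning says the shape palette can handle at most six discrete values, while class has seven. The seventh level alphabetically (suv) gets no shape, so those points are dropped from the plot. Supplying seven shapes yourself with scale_shape_manual() is one workaround.

```r
library(ggplot2)

# class has seven levels, but the default shape palette supplies at most six
p <- ggplot(mpg, aes(x = displ, y = hwy, shape = class)) +
  geom_point()
p  # warns, and drops the points whose class received no shape

# Count the rows left without a shape (the suv cars)
ld <- suppressWarnings(layer_data(p))
sum(is.na(ld$shape))

# One workaround: supply seven point shapes by hand
p_fixed <- p + scale_shape_manual(values = 1:7)
p_fixed
```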

Let’s clean our graph up

Less is more when it comes to data visualization. theme_minimal() drops the grey panel background so the data stand out.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(colour = class)) + 
  geom_smooth(method = "lm") + 
  theme_minimal() + 
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  ) + 
  scale_color_colorblind()

EXERCISE

Head over to the ggplot2 documentation and find your favorite preset (complete) theme:

https://ggplot2.tidyverse.org/reference/ggtheme.html
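A sketch for comparing presets side by side: store the plot once and add a different complete theme each time. The list name `presets` is just for illustration. theme_set() makes a theme the default for the rest of the session.

```r
library(ggplot2)

# One base plot, stored once so different preset themes can be added to it
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# A few of the complete themes that ship with ggplot2
presets <- list(
  bw      = theme_bw(),
  classic = theme_classic(),
  light   = theme_light(),
  dark    = theme_dark()
)
plots <- lapply(presets, function(th) p + th)
plots$classic

# theme_set() changes the default theme and returns the previous one
old <- theme_set(theme_classic())
theme_set(old)  # restore the previous default
```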

Creating your own theme

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(colour = class)) + 
  geom_smooth(method = "lm") + 
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    panel.background = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  ) + 
  labs(
    title = "Engine displacement and highway miles per gallon",
    subtitle = "Values for seven different classes of cars",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  ) + 
  scale_color_colorblind()
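If you like your customizations, one option is to save them in an object and reuse them. This is a minimal sketch: `my_theme` is a hypothetical name, and it starts from theme_minimal() before layering the theme() tweaks from the slide on top.

```r
library(ggplot2)

# Store theme() tweaks in an object so any plot can reuse them
my_theme <- theme_minimal() +
  theme(
    legend.position = "bottom",
    panel.grid = element_blank(),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold")
  )

# Add the saved theme like any other component
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  labs(title = "Engine displacement and highway miles per gallon") +
  my_theme
p
```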

EXERCISE

Customize the last plot you made using the theme() function.

Working with categorical data

We often want to explore patterns in categorical (or discrete) data, which calls for different tools from the scatterplots above.

select(mpg, manufacturer, model, drv)
# A tibble: 234 × 3
   manufacturer model      drv  
   <chr>        <chr>      <chr>
 1 audi         a4         f    
 2 audi         a4         f    
 3 audi         a4         f    
 4 audi         a4         f    
 5 audi         a4         f    
 6 audi         a4         f    
 7 audi         a4         f    
 8 audi         a4 quattro 4    
 9 audi         a4 quattro 4    
10 audi         a4 quattro 4    
# ℹ 224 more rows

Visualizing distributions

ggplot(mpg, aes(x = drv)) + 
  geom_bar()
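geom_bar() counts the rows in each category for you. This sketch computes the same counts by hand with dplyr's count() and draws them with geom_col(), which uses the y values as given.

```r
library(ggplot2)
library(dplyr)

# One row per drive type with the number of cars in column n
drv_counts <- count(mpg, drv)
drv_counts  # drv: 4, f, r; n: 103, 106, 25

# With precomputed counts, geom_col() draws bars at the heights given by n
ggplot(drv_counts, aes(x = drv, y = n)) +
  geom_col()
```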

Visualizing distributions

Reorder bars by frequency

ggplot(mpg, aes(x = fct_infreq(drv))) +
  geom_bar()
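fct_infreq() works by reordering the factor's levels, and ggplot2 draws bars in level order. This sketch prints the levels directly, and shows fct_rev() for the opposite order.

```r
library(forcats)
library(ggplot2)  # for the mpg data

# fct_infreq() orders factor levels from most to least common
levels(fct_infreq(mpg$drv))           # "f" "4" "r"

# fct_rev() flips that order, e.g. for bars from shortest to tallest
levels(fct_rev(fct_infreq(mpg$drv)))  # "r" "4" "f"
```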

Visualizing numeric variables

ggplot(mpg, aes(x = hwy)) +
  geom_histogram()
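Without a binwidth or bins argument, geom_histogram() falls back to 30 bins and prints a message asking you to pick a better value. A sketch with an explicit bin width (2 mpg per bar is an arbitrary choice worth experimenting with):

```r
library(ggplot2)

# Each bar spans 2 highway miles per gallon
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
p

# Every car falls in exactly one bin, so the bar heights sum to the row count
sum(layer_data(p)$count)  # 234
```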

Visualizing numeric variables

ggplot(mpg, aes(x = hwy)) +
  geom_density()

Visualizing numeric variables

ggplot(mpg, aes(x = hwy, colour = drv)) +
  geom_density()
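Mapping colour to a categorical variable also splits the data into groups, and the density is estimated separately for each one. Each curve has area 1 within its own group, so the small r group (25 cars) looks as prominent as the f group (106 cars). This sketch confirms the three groups in the built data.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, colour = drv)) +
  geom_density()

# One density curve per level of drv
ld <- layer_data(p)
length(unique(ld$group))  # 3: one each for 4, f and r
```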

Visualizing numeric variables

ggplot(mpg, aes(x = hwy, colour = drv, fill = drv)) +
  geom_density(alpha = 0.5)

Summary

This session you:

  1. Set up your data science tools

  2. Plotted complex data in an engaging way

  3. Discovered interesting relationships in the data

  4. Connected these relationships or trends to your expectations (or hypotheses about the data)